Hi! Sometimes I watch movies and sometimes I write code. Here is a random collection of graphs from an analysis of a movie dataset.

Nerd stuff: The dataset draws on data from GroupLens, the research lab behind MovieLens, a site for rating movies and getting recommendations. It can be found at: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset.

An SQL database was created from this dataset. Queries were run in SQL, the results were processed in R, and the figures were plotted with Plotly for R.
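The post doesn't show the queries themselves. As a rough illustration of the SQL side only, here is a minimal Python sketch using the standard-library `sqlite3` module; the actual database engine isn't stated, so SQLite is an assumption. The `movies` and `movie_genres` tables, their columns, and the sample rows are all hypothetical, and the sketch assumes the genres have already been split into one row per movie-genre pair.

```python
import sqlite3

# Hypothetical schema: one row per movie, plus one row per (movie, genre) tag.
# The real dataset's columns differ; this only illustrates the query pattern.
SCHEMA = """
CREATE TABLE movies (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    release_year INTEGER
);
CREATE TABLE movie_genres (
    movie_id INTEGER NOT NULL REFERENCES movies(id),
    genre TEXT NOT NULL
);
"""

# Made-up sample rows, for illustration only.
MOVIES = [
    (1, "Movie A", 1995),
    (2, "Movie B", 1995),
    (3, "Movie C", 1996),
]
GENRES = [
    (1, "Comedy"),
    (1, "Drama"),  # one movie can carry several genre tags
    (2, "Drama"),
    (3, "Comedy"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", MOVIES)
    conn.executemany("INSERT INTO movie_genres VALUES (?, ?)", GENRES)
    return conn


def genre_counts_per_year(conn: sqlite3.Connection) -> list[tuple[int, str, int]]:
    """Count how many times each genre is tagged per release year."""
    query = """
        SELECT m.release_year, g.genre, COUNT(*) AS n
        FROM movie_genres AS g
        JOIN movies AS m ON m.id = g.movie_id
        GROUP BY m.release_year, g.genre
        ORDER BY m.release_year, g.genre
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in genre_counts_per_year(conn):
        print(row)
```

Keeping one row per tag makes the multi-genre case simple: a movie tagged both Comedy and Drama just contributes one row to each group.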

Budget vs Rating

Top-rated Movies

Popularity of Genres

This graph shows how many times movies are tagged with each genre, year by year. Note that one movie can have multiple tags. You can click a genre to show or hide it. The results make it clear that the dataset has some flaws: many movies lack genre tags, especially older ones. One way to compensate is to correct for the total number of movies released each year, which you can do by clicking Total movies released. Alternatively, each genre count can be taken as a ratio of all genre tags, which you can see by selecting the stacked area chart on the second tab, Stacked area.

Line chart

Stacked area
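The post doesn't say exactly how the genre chart applies the two corrections described above. Assuming "correcting" means dividing, here is a minimal Python sketch of both: dividing each genre's yearly count by the number of movies released that year, and dividing it by all genre tags that year. The function names and input values are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: genre tag counts per (year, genre) and the number of
# movies released per year. Values are made up for illustration.
GENRE_COUNTS = {
    (1995, "Comedy"): 1,
    (1995, "Drama"): 2,
    (1996, "Comedy"): 1,
}
MOVIES_PER_YEAR = {1995: 2, 1996: 1}


def per_movie_released(counts, movies_per_year):
    """Option 1: divide each genre count by the movies released that year."""
    return {
        (year, genre): n / movies_per_year[year]
        for (year, genre), n in counts.items()
    }


def share_of_all_tags(counts):
    """Option 2: divide each genre count by all genre tags that year.

    The shares for a given year sum to 1, which is what a stacked
    area chart stacks up to.
    """
    tags_per_year = defaultdict(int)
    for (year, _genre), n in counts.items():
        tags_per_year[year] += n
    return {
        (year, genre): n / tags_per_year[year]
        for (year, genre), n in counts.items()
    }


if __name__ == "__main__":
    print(per_movie_released(GENRE_COUNTS, MOVIES_PER_YEAR))
    print(share_of_all_tags(GENRE_COUNTS))
```

Because one movie can carry several tags, the first option's values for a year can add up to more than 1 (1.5 for 1995 in the sample data). The second option always sums to 1 per year, which is why it fits a stacked area chart.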